Supporting Fine-Grained Synchronization on a Simultaneous Multithreading Processor

نویسندگان

  • Dean M. Tullsen
  • Jack L. Lo
  • Susan J. Eggers
  • Henry M. Levy
چکیده

Existing multiprocessor synchronization mechanisms are relatively heavyweight, due in part to the level of the memory hierarchy (typically main memory) at which threads must synchronize. Multithreaded processors, on the other hand, have the potential to significantly reduce synchronization cost, because threads share the processor simultaneously and can synchronize using processor-internal state. This paper proposes and evaluates new synchronization schemes for a simultaneous multithreaded processor. We present a scalable mechanism that permits threads to cheaply synchronize within the processor, with blocked threads consuming no processor resources. The basic mechanism, blocking acquire and release, is a hardware implementation of traditional software synchronization abstractions, implemented with a thread-shared hardware lock box. This study shows that a multithreading processor with efficient synchronization enables parallelization at a granularity an order of magnitude finer than required on conventional parallel machines with memory-based synchronization. When combined with lock release prediction, which reduces the overhead of restarting a blocked thread, acquire-release gains an additional improvement of 40%. Overall, we show that these substantial improvements in synchronization cost enable parallelization of code that could not be effectively parallelized using traditional synchronization; performance of these codes showed up to a two-fold improvement over single-threaded versions of the previously unparallelizable code.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

How to Emulate Fine-grained Multithreading

Fine-grained multithreading can be used to hide longlatency operations encountered in parallel computers during remote memory access. Instead of using special processor hardware, the emulation of fine-grained multithreading on standard processor hardware is investigated. While emulation of coarse-grained multithreading is common in modern operating systems, in the fine-grained case research on ...

متن کامل

Synchronization coherence: A transparent hardware mechanism for cache coherence and fine-grained synchronization

The quest to improve performance forces designers to explore finer-grained multiprocessor machines. Ever increasing chip densities based on CMOS improvements fuel research in highly parallel chip multiprocessors with 100s of processing elements. With such increasing levels of parallelism, synchronization is set to become a major performance bottleneck and efficient support for synchronization a...

متن کامل

Simultaneous Multithreading: Maximizing On-Chip Parallelism - Computer Architecture, 1995. Proceedings., 22nd Annual International Symposium on

This paper examines simultaneous multithreading, a technique permitting several independent threads to issue instructions to a superscalar's multiple functional units in a single cycle. We present several models of simultaneous multithreading and compare them with altemative organizations: a wide superscalar, a fine-grain multithreaded processor, and single-chip, multiple-issue multiprocessing ...

متن کامل

Predictable Fine-Grained Cache Behavior for Enhanced Simultaneous Multithreading (SMT) Scheduling

By converting thread-level parallelism to instruction level parallelism, Simultaneous Multithreaded (SMT) processors are emerging as effective ways to utilize the resources of modern superscalar architectures. However, the full potential of SMT has not yet been reached as most modern operating systems use existing single-thread or multiprocessor algorithms to schedule threads, neglecting conten...

متن کامل

Understanding the Impact of Inter-Thread Cache Interference on ILP in Modern SMT Processors

Simultaneous Multithreading (SMT) has emerged as an effective method of increasing utilization of resources in modern super-scalar processors. SMT processors increase instruction-level parallelism (ILP) and resource utilization by simultaneously executing instructions from multiple independent threads. Although simultaneously sharing resources benefits system throughput, coscheduled threads oft...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1999